AITopics | shallow relu network

8b8fe72f3193fe78ac353ebcc686b395-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 15:16:52 GMT

activation pattern, dataset, gradient descent, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs

Neural Information Processing Systems | Dec-25-2025, 01:40:40 GMT

We prove that, for the fundamental regression task of learning a single neuron, training a one-hidden layer ReLU network of any width by gradient flow from a small initialisation converges to zero loss and is implicitly biased to minimise the rank of network parameters. By assuming that the training points are correlated with the teacher neuron, we complement previous work that considered orthogonal datasets. Our results are based on a detailed non-asymptotic analysis of the dynamics of each hidden neuron throughout the training. We also show and characterise a surprising distinction in this setting between interpolator networks of minimal rank and those of minimal Euclidean norm. Finally we perform a range of numerical experiments, which corroborate our theoretical findings.

dynamic and implicit bias, name change, shallow relu network, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)
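
The setting in the abstract above (a teacher neuron, training points correlated with it, a one-hidden-layer ReLU student, small initialisation) can be sketched numerically. This is a minimal illustration, not the paper's analysis or experiments: plain gradient descent with a small step stands in for gradient flow, and the function name `train_single_neuron`, the data construction, and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def train_single_neuron(width=50, dim=4, n_points=3, init_scale=1e-2,
                        lr=0.05, steps=5000, seed=0):
    """Fit a teacher ReLU neuron with a shallow ReLU student (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Teacher neuron v (assumed to be the first basis vector).
    v = np.zeros(dim)
    v[0] = 1.0

    # Training points correlated with the teacher: v plus noise orthogonal
    # to v, normalised to unit length, so that v . x_i > 0 for every point.
    noise = rng.standard_normal((n_points, dim))
    noise -= np.outer(noise @ v, v)
    X = v + 0.5 * noise
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = np.maximum(X @ v, 0.0)

    # Small random initialisation of hidden weights W and output weights a.
    W = init_scale * rng.standard_normal((width, dim))
    a = init_scale * rng.standard_normal(width)

    def loss(W, a):
        pred = np.maximum(X @ W.T, 0.0) @ a
        return 0.5 * np.mean((pred - y) ** 2)

    initial_loss = loss(W, a)
    for _ in range(steps):
        pre = X @ W.T                      # pre-activations, shape (n, width)
        act = np.maximum(pre, 0.0)
        resid = act @ a - y                # residuals, shape (n,)
        grad_a = act.T @ resid / n_points
        grad_W = ((resid[:, None] * (pre > 0)) * a).T @ X / n_points
        a -= lr * grad_a
        W -= lr * grad_W
    return initial_loss, loss(W, a), W

if __name__ == "__main__":
    initial_loss, final_loss, W = train_single_neuron()
    print("initial loss:", initial_loss)
    print("final loss:", final_loss)
    # Singular values of the hidden-layer weights, to inspect their spread.
    print("singular values:", np.linalg.svd(W, compute_uv=False))
```

Printing the singular values of `W` is one way to inspect how concentrated the learned hidden weights are. What a discrete-time run with a finite initialisation shows is only suggestive of the gradient-flow result stated in the abstract.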

A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

Neural Information Processing Systems | Dec-24-2025, 18:11:12 GMT

The NP-hard problem of optimizing a shallow ReLU network can be characterized as a combinatorial search over each training example's activation pattern followed by a constrained convex problem given a fixed set of activation patterns. We explore the implications of this combinatorial aspect of ReLU optimization in this work. We show that it can be naturally modeled via a geometric and combinatoric object known as a zonotope with its vertex set isomorphic to the set of feasible activation patterns. This assists in analysis and provides a foundation for further research. We demonstrate its usefulness when we explore the sensitivity of the optimal loss to perturbations of the training data. Later we discuss methods of zonotope vertex selection and its relevance to optimization. Overparameterization assists in training by making a randomly chosen vertex more likely to contain a good solution. We then introduce a novel polynomial-time vertex selection procedure that provably picks a vertex containing the global optimum using only double the minimum number of parameters required to fit the data. We further introduce a local greedy search heuristic over zonotope vertices and demonstrate that it outperforms gradient descent on underparameterized problems.

combinatorial perspective, name change, optimization, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.60)
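
The notion of an activation pattern used in the abstract above can be made concrete with a small sketch. It samples random weight vectors for a single bias-free ReLU neuron, records which training points each one activates, and compares the number of distinct patterns found with the classical upper bound on homogeneous linear dichotomies (Cover's counting function). The helper names and the toy dataset are assumptions for illustration. This is not the paper's zonotope construction or its vertex selection procedure.

```python
import numpy as np
from math import comb

def activation_pattern(X, w):
    """Binary pattern recording which rows of X activate the neuron relu(w . x)."""
    return tuple((X @ w > 0).astype(int).tolist())

def sample_patterns(X, n_samples=20000, seed=0):
    """Collect the distinct activation patterns hit by random weight vectors."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_samples, X.shape[1]))
    return {activation_pattern(X, w) for w in W}

def cover_bound(n, d):
    """Maximum number of sign patterns of n points in R^d under hyperplanes
    through the origin: 2 * sum_{k=0}^{d-1} C(n-1, k)."""
    return 2 * sum(comb(n - 1, k) for k in range(d))

if __name__ == "__main__":
    # Toy dataset: four points in the plane (an assumption for illustration).
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]])
    patterns = sample_patterns(X)
    print(len(patterns), "patterns found; upper bound", cover_bound(*X.shape))

    # With the pattern held fixed, the neuron is linear in w:
    # relu(X w) equals diag(pattern) X w.
    w = np.array([0.3, -0.7])
    p = np.array(activation_pattern(X, w))
    print(np.allclose(np.maximum(X @ w, 0.0), p * (X @ w)))
```

The last check is the reason the problem becomes convex once the patterns are fixed: each neuron's output is then a linear function of its weights, restricted to the region where the pattern stays valid.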

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

Neural Information Processing Systems | Dec-24-2025, 14:31:35 GMT

The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden layer ReLU neural networks for the mean squared error at small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, some interesting phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle to saddle dynamics.

gradient flow dynamic, shallow relu network, square loss and orthogonal input, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Finite Samples for Shallow Neural Networks

Xia, Yu, Xu, Zhiqiang

arXiv.org Artificial Intelligence | Mar-16-2025

This paper investigates the ability of finite samples to identify two-layer irreducible shallow networks with various nonlinear activation functions, including rectified linear units (ReLU) and analytic functions such as the logistic sigmoid and hyperbolic tangent. An "irreducible" network is one whose function cannot be represented by another network with fewer neurons. For ReLU activation functions, we first establish necessary and sufficient conditions for determining the irreducibility of a network. Subsequently, we prove a negative result: finite samples are insufficient for definitive identification of any irreducible ReLU shallow network. Nevertheless, we demonstrate that for a given irreducible network, one can construct a finite set of sampling points that can distinguish it from other networks with the same neuron count. Conversely, for logistic sigmoid and hyperbolic tangent activation functions, we provide a positive result. We construct finite samples that enable the recovery of two-layer irreducible shallow analytic networks. To the best of our knowledge, this is the first study to investigate the exact identification of two-layer irreducible networks using finite sample function values. Our findings provide insights into the comparative performance of networks with different activation functions under limited sampling conditions.

artificial intelligence, machine learning, neuron, (16 more...)

arXiv.org Artificial Intelligence

2503.12744

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
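
To make the definition of irreducibility above concrete, here is a small sketch of one simple way a bias-free shallow ReLU network can fail to be irreducible: two hidden neurons whose weight vectors are positive multiples of each other can be merged, because relu(c z) = c relu(z) for c > 0. The function names are hypothetical. This illustrates only that single case and is not the necessary and sufficient condition established in the paper.

```python
import numpy as np

def shallow_relu(X, W, a):
    """Output of a bias-free shallow ReLU network: sum_j a_j * relu(w_j . x)."""
    return np.maximum(X @ W.T, 0.0) @ a

def merge_parallel_pair(W, a, i, j):
    """Merge neuron j into neuron i, assuming W[j] = c * W[i] for some c > 0
    and W[i] nonzero. Returns the weights of the smaller network."""
    c = np.linalg.norm(W[j]) / np.linalg.norm(W[i])
    if not np.allclose(W[j], c * W[i]):
        raise ValueError("neurons i and j are not positively proportional")
    a_new = a.copy()
    a_new[i] += c * a[j]                  # a_i relu(z) + a_j relu(c z) = (a_i + c a_j) relu(z)
    keep = [k for k in range(len(a)) if k != j]
    return W[keep], a_new[keep]

if __name__ == "__main__":
    # Three neurons; the second is twice the first, so the network is reducible.
    W = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, -1.0]])
    a = np.array([1.0, -0.5, 2.0])
    W_small, a_small = merge_parallel_pair(W, a, 0, 1)

    X = np.random.default_rng(0).standard_normal((100, 2))
    same = np.allclose(shallow_relu(X, W, a), shallow_relu(X, W_small, a_small))
    print(W_small.shape[0], "neurons after merging; outputs agree:", same)
```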
